We focus on Twitter data because Twitter is a highly popular social media platform that offers insight into individuals' mindsets through short text posts called tweets. Its users include influential public figures, such as senators, as well as anyone else who chooses to use the platform. This gives us social media data in a standardized format, which allows us to draw conclusions about a wide variety of individuals. Twitter is also a good fit because it is primarily text based, so we can analyze most of what a post communicates. On a platform such as Instagram, which focuses on pictures, a text analysis would miss much of the context.
For the senators dataset, the length of the data (its number of columns) is 10. The average number of replies across all tweets is 41.9017584. We also examined the maximum and minimum number of favorites: the most-favorited tweet received 2108865 favorites, while the least-favorited received 0. The most retweets a tweet received is 3644423, and the fewest is also 0. Finally, we examined which party posted the most tweets in this period: D, which represents the Democratic Party.
For the Russian trolls dataset, the length of the data (its number of columns) is 21. Here we focused more on the users. Users have 2256.3982189 followers on average and follow 2008.342079 accounts on average. The most-followed user has 23890 followers, while the least-followed user has none (0). The most active user, meaning the one who posted the most tweets, is AMELIEBALDWIN.
Finally, for the sentiment dataset, the data has 6 columns: target, id, date, flag, user, text. There are 100002 tweets that indicate a positive mood and 299998 that indicate a negative mood. We also examined which days of the week are most and least popular for posting tweets: the most popular day is Tuesday (Tue), while the least popular is Sunday (Sun).
| Number of Hashtags | Average Number of Retweets |
|---|---|
| 0 | 373.56405 |
| 1 | 113.42351 |
| 2 | 72.83565 |
| 3 | 58.58104 |
| 4 | 57.03279 |
| 5 | 57.13406 |
| 6 | 55.04412 |
| 7 | 15.50000 |
| 8 | 499.00000 |
| 9 | 1.00000 |
The table shows a generally negative relationship between the number of hashtags in a tweet and its average number of retweets. The exception is tweets with 8 hashtags, which can be treated as an outlier. The averages for 7 to 9 hashtags are likely based on very few tweets, so they should be read with caution. We use this table because it gives an intuitive sense of how the average number of retweets varies with the number of hashtags.
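The grouping behind a table like this can be sketched as follows. This is a minimal illustration, not the code used for the report: the function name `mean_retweets_by_hashtags`, the rule for counting hashtags (whitespace-separated tokens starting with `#`), and the sample rows are all assumptions made up for the example. It also returns the number of tweets in each group, which helps judge whether an average such as the one for 8 hashtags rests on only a handful of tweets.

```python
from collections import defaultdict


def mean_retweets_by_hashtags(tweets):
    """Group (text, retweets) pairs by hashtag count.

    Returns {hashtag_count: (mean_retweets, n_tweets)}, sorted by hashtag count.
    """
    totals = defaultdict(lambda: [0, 0])  # hashtag_count -> [retweet_sum, n_tweets]
    for text, retweets in tweets:
        # Assumed rule: a hashtag is a whitespace-separated token that
        # starts with '#' and has at least one more character.
        n_tags = sum(1 for tok in text.split() if tok.startswith("#") and len(tok) > 1)
        totals[n_tags][0] += retweets
        totals[n_tags][1] += 1
    return {k: (s / n, n) for k, (s, n) in sorted(totals.items())}


# Made-up toy rows for illustration only; not taken from the dataset.
sample = [
    ("Town hall tonight", 300),
    ("Vote today #election", 120),
    ("Thanks all", 100),
    ("#budget #jobs update", 60),
]
result = mean_retweets_by_hashtags(sample)
print(result)
```

A group with only one or two tweets produces an average that is really just those tweets' own retweet counts, so it should not be weighed like the averages for 0 to 6 hashtags.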
| Regions | Number of Tweets |
|---|---|
| United States | 159369 |
| United Arab Emirates | 9598 |
| Italy | 6278 |
| Azerbaijan | 6238 |
| Russian Federation | 1295 |
| Ukraine | 1232 |
| Israel | 675 |
| Germany | 656 |
| United Kingdom | 600 |
| France | 22 |
| Iraq | 17 |
| Turkey | 9 |
| Japan | 2 |
| Serbia | 2 |
| Egypt | 1 |
This table lists each region's total number of tweets in descending order. The United States has the largest number of qualifying tweets in this period and Egypt has the smallest. We use this table because it presents the number of tweets per country very clearly.
| Is the Mood Positive? | Number of Tweets |
|---|---|
| FALSE | 299998 |
| TRUE | 100002 |
The table for this dataset was somewhat harder to compute. However, it directly shows the total number of tweets in a positive mood and the total number in a negative mood. It is meaningful because it suggests something about tweets, and perhaps social media in general: in this dataset there are roughly three negative tweets for every positive one, so people appear more inclined to post something negative than something positive.
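To make the imbalance concrete, the two counts from the table can be turned into shares. This is only a small arithmetic sketch. The variable names are chosen for illustration, and the counts are the ones reported above.

```python
# Counts taken from the mood table above.
negative = 299998
positive = 100002

total = negative + positive
positive_share = positive / total
negative_share = negative / total
negative_per_positive = negative / positive

print(f"total tweets: {total}")
print(f"positive share: {positive_share:.4%}")
print(f"negative share: {negative_share:.4%}")
print(f"negative tweets per positive tweet: {negative_per_positive:.3f}")
```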
| Mood | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| Negative | 0.00000 | 6.40000 | 11.90000 | 18.81667 | 23.98333 |
| Positive | 0.000000 | 4.483333 | 10.283333 | 18.333333 | 23.983333 |

Note: These quantiles of posting time, in hours of the day, accompany the box plot. The first row gives the statistics for negative mood tweets and the second row gives the statistics for positive mood tweets.
A box plot is well suited to showing how the time of day relates to mood because it displays the median hour, the quartiles, and the minimum and maximum values, giving readers a quick statistical summary. Based on the box plot and the summary above, the median negative mood tweet is posted at hour 11.9 (11:54) and the median positive mood tweet at hour 10.2833333 (10:17). Negative mood tweets are shifted toward later in the day, which could indicate that people tend to be in a more negative mood after a long work or school day. Positive mood tweets skew earlier, possibly because people are fresh out of bed.
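The hours in the summary are fractional hours of the day, so 11.9 means 11 hours plus 0.9 × 60 = 54 minutes. The sketch below shows the conversion in both directions. The helper names `to_decimal_hour` and `decimal_hour_to_clock` are hypothetical, and the assumption that the hour was computed as hour plus minute / 60 is inferred from values such as 18.81667, not stated in the report.

```python
def to_decimal_hour(hour, minute):
    """Express a clock time as a fractional hour of the day (assumed encoding)."""
    return hour + minute / 60


def decimal_hour_to_clock(decimal_hour):
    """Convert a fractional hour of the day (e.g. 11.9) to an 'HH:MM' string."""
    # Round to whole minutes to absorb floating-point noise such as 713.9999.
    total_minutes = round(decimal_hour * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


# Medians reported in the summary above.
negative_median = decimal_hour_to_clock(11.9)
positive_median = decimal_hour_to_clock(10.2833333)
print(negative_median, positive_median)
```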
A pie chart is suitable for this data because it shows each day's share of the total number of tweets created over the whole week. The tweets are fairly evenly distributed across the days, with slight increases on the weekend and midweek. This could be explained by how easily people can access Twitter on any day through personal devices such as smartphones and computers.
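The per-day shares behind such a pie chart can be sketched as below. This is an illustrative example under stated assumptions: it assumes each value in the date column begins with a three-letter weekday abbreviation such as `Tue`, the rest of the timestamp layout is invented for the example, and the function name `weekday_shares` and the sample dates are made up.

```python
from collections import Counter


def weekday_shares(dates):
    """Return {weekday: share of tweets}, most frequent day first.

    Assumes each date string starts with a weekday abbreviation
    followed by whitespace (for example 'Tue ...').
    """
    counts = Counter(d.split()[0] for d in dates)
    total = sum(counts.values())
    return {day: n / total for day, n in counts.most_common()}


# Made-up sample timestamps for illustration only; not taken from the dataset.
sample_dates = [
    "Tue Apr 07 09:15:00 2009",
    "Tue Apr 07 18:40:12 2009",
    "Sun Apr 05 11:02:45 2009",
    "Wed Apr 08 07:30:00 2009",
]
shares = weekday_shares(sample_dates)
print(shares)
```

Each share is a slice of the pie, so the values sum to 1 across the days that appear in the data.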